Genomic insights into the endangered white-eared night heron (Gorsachius magnificus)

Objectives: A genome sequence of a threatened species can provide genetic information that is important for improving conservation strategies. The white-eared night heron (Gorsachius magnificus) is an endangered and poorly known ardeid bird. To support future studies on the conservation genetics and evolutionary adaptation of this species, we report a de novo assembled and annotated whole-genome sequence of G. magnificus.
Data description: The final draft genome assembly of G. magnificus was 1.19 Gb in size, with a contig N50 of 187.69 kb and a scaffold N50 of 7,338.28 kb. According to BUSCO analysis, the genome assembly contained complete copies of 97.49% of the 8,338 genes in the Aves (odb10) dataset. Approximately 10.52% of the genome assembly was composed of repetitive sequences. A total of 14,613 protein-coding genes were predicted in the genome assembly, with functional annotations available for 14,611 genes. The genome assembly exhibited a heterozygosity rate of 0.49 heterozygous SNPs per kilobase pair. This draft genome of G. magnificus provides a valuable genomic resource for future studies on conservation and evolution.

Objectives
The white-eared night heron (Gorsachius magnificus) is a medium-sized ardeid bird distributed in tropical and subtropical moist lowland forests in southern and southwestern China [1,2,3], northern Vietnam [4], and northeastern India [5]. Due to its small and fragmented population, G. magnificus is currently listed as Endangered (EN) by the International Union for the Conservation of Nature (IUCN) [6] and is listed in the first order of the National Key Protected Wild Animal List in China [7].
G. magnificus is not well understood due to its solitary, nocturnal, and cryptic behaviour [4,8]. Previous studies on G. magnificus primarily focused on documenting its distribution in new areas [2,4,9]. These findings expanded our knowledge of the species' range and resulted in its threat category being changed from Critically Endangered to Endangered by the IUCN in 2000 [10]. Some researchers have suggested further downgrading its threat status because G. magnificus has been observed in almost the entire southern region of China [11]. However, ecological niche modelling predicted that suitable habitats for G. magnificus are limited and scattered within the mountain chains of southern China [12]. It also suggested that future climate change could alter its distribution and pose a threat to its population. Therefore, the authors recommended exercising caution when considering any downgrading of the threat status for G. magnificus.
The genetic information of a threatened species is valuable for various aspects of its conservation biology, including estimating effective population size, inbreeding level, genetic diversity, and population structure [13]. However, until now, very little genetic information has been available for G. magnificus, apart from the complete mitochondrial genome sequence [14]. In this study, we sequenced the genome of G. magnificus and evaluated its genetic diversity using heterozygous single nucleotide polymorphisms (SNPs) at the whole-genome level. The genetic diversity of G. magnificus can help to correctly assess its conservation status. Additionally, the draft genome sequence can facilitate future conservation biology studies to help protect this endangered species.

Data description
A muscle sample was obtained from a dead G. magnificus provided by the Jiulingshan National Reserve; reserve personnel had found and confiscated the bird at a local farmers' market in Jing'an Town (115.11'25" E, 28.69' 21''N), Yichun City, Jiangxi Province, China, in 2007. The muscle tissue was stored at -80 °C after collection. This research was approved by the Ethics Committee for Animal Experimentation of Xiamen University.

Genomic DNA was extracted using the QIAGEN® Puregene Tissue Core Kit A (Qiagen, Beijing, China), following the manufacturer's instructions. Two short-insert libraries (230 and 500 bp) were prepared using the Illumina TruSeq DNA Library Preparation Kit (Illumina, San Diego, USA), while three mate-pair libraries (2, 5, and 10 kb) were constructed using the Nextera Mate Pair Sample Preparation Kit (Illumina, San Diego, USA). The libraries were sequenced on the Illumina HiSeq 2500 sequencing platform at Novogene (Beijing). The sequencing depths were 30.30× and 26.10× for the two short-insert libraries, and 16.05×, 11.26×, and 9.90× for the three mate-pair libraries, respectively. Raw reads were filtered using Cutadapt [15] and Trimmomatic [16] to remove adapters and low-quality reads (quality score < 20), respectively.

The resulting clean reads were assembled using SOAPdenovo2 following the official guidelines [17]. Genome size was estimated from the 17-mer frequency distribution computed with Jellyfish [18]. A total of 123.14 Gb of clean sequence data was obtained, and the genome size was estimated at approximately 1.32 Gb. The final genome assembly had a length of 1.19 Gb, with contig and scaffold N50 sizes of 187.69 kb and 7,338.28 kb, respectively (Table 1, Data file 1). The longest scaffold was 29,902.37 kb. The completeness of the genome assembly was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.1.3 [19] with the Aves (odb10) dataset. The analysis showed that 97.49% of the BUSCO genes (8,103 single-copy and 25 duplicated) were complete, 0.56% were fragmented (47 genes), and 1.95% were missing (163 genes) (Table 1, Data file 2).
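For readers unfamiliar with the N50 statistic used for the contig and scaffold values above, the following Python sketch illustrates the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are hypothetical and for illustration only; this is not the script used to produce the reported values.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the sequence that crosses this threshold is the N50.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths (arbitrary units), not data from this study.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

In practice, the lengths would be read from the contig or scaffold FASTA files of the assembly rather than typed in by hand.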
In conclusion, the draft genome sequence of G. magnificus can serve as a valuable resource for future studies on the genetic mechanisms underlying adaptations and the estimation of important parameters relevant to conservation efforts.

Limitations
Due to the difficulty of obtaining fresh tissues, this draft genome assembly was generated using short-read shotgun sequencing. Consequently, the final assembly (1.19 Gb) was smaller than the k-mer-based estimate (1.32 Gb), suggesting that the assembly remains fragmented and contains gaps. In the future, if fresh tissue can be collected, long-read sequencing technologies, such as Pacific Biosciences (PacBio) or Oxford Nanopore sequencing, would help to improve the completeness and accuracy of the genome assembly.
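The comparison between assembly size and k-mer estimate can be made concrete with a short Python sketch. It assumes a commonly used approximation, in which genome size is the total number of k-mer occurrences divided by the depth of the main peak of the k-mer histogram; the exact procedure applied to the Jellyfish output in this study is not specified here. The function name, the `min_depth` cutoff for discarding likely error k-mers, and the toy histogram are all hypothetical. Only the 1.19 Gb and 1.32 Gb figures come from the text.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are ignored as probable sequencing errors
    (the cutoff is an assumption of this sketch, not a value from the study).
    """
    solid = {d: c for d, c in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)          # depth of the main peak
    total_kmers = sum(d * c for d, c in solid.items())  # total k-mer occurrences
    return total_kmers / peak_depth


# Hypothetical toy histogram, not data from this study.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 400, 21: 300, 22: 100}
print(estimate_genome_size(toy_histogram))

# Fraction of the k-mer-estimated genome covered by the assembly,
# using the sizes reported in the text (in Gb).
assembly_gb = 1.19
kmer_estimate_gb = 1.32
print(round(assembly_gb / kmer_estimate_gb, 3))
```

With the reported sizes, the assembly spans roughly 90% of the k-mer-based estimate, which is consistent with the interpretation that part of the genome remains unassembled or collapsed.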

Table 1
Overview of data files/data sets